Characterization of Randomized Shuffle and Sort Quantifiability in the MapReduce Model
Authors
Abstract
Quantifiability is a concept in MapReduce analytics based on the following two conditions: (a) a mapper should be cautious, that is, it should not exclude any reducer's shuffle and sort strategy from consideration; and (b) a mapper should respect the reducers' shuffle and sort preferences, that is, it should deem a reducer's shuffle and sort strategy ki infinitely more likely than k'i whenever it believes the reducer to prefer ki to k'i. A shuffle and sort strategy is quantifiable if it can optimally be chosen under common shuffle and sort conjecture in conditions (a) and (b). In this paper we present an algorithm that, for every finite MapReduce operation, computes the set of all quantifiable shuffle and sort strategies. The algorithm is based on the new idea of a key-value preference limitation, which is a pair (ki, Vi) consisting of a shuffle and sort strategy ki and a subset of shuffle and sort strategies Vi, for mapper i. The interpretation is that mapper i prefers some shuffle and sort strategy in Vi to ki. The algorithm proceeds by successively adding key-value preference limitations to the MapReduce operation.
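The iterative procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the `prefers` predicate and the toy payoff table are assumptions introduced here only to make the elimination loop concrete.

```python
# Sketch of the abstract's algorithm: repeatedly apply key-value
# preference limitations (k_i, V_i) -- "mapper i prefers some strategy
# in V_i to k_i" -- eliminating such k_i until a fixed point is reached.
# What survives is the set of quantifiable shuffle and sort strategies.

def quantifiable_strategies(strategies, prefers):
    """strategies: dict mapping each mapper to its set of shuffle/sort
    strategies. prefers(i, k, candidates): True iff mapper i prefers
    some strategy in `candidates` to k, i.e. (k, candidates) is a
    valid key-value preference limitation."""
    current = {i: set(ks) for i, ks in strategies.items()}
    changed = True
    while changed:
        changed = False
        for i, ks in current.items():
            for k in list(ks):
                v_i = ks - {k}  # candidate set V_i for the limitation
                if v_i and prefers(i, k, v_i):
                    ks.remove(k)  # k can never be optimally chosen
                    changed = True
    return current

# Illustrative use: one mapper, two strategies, a hypothetical payoff.
payoff = {"a": 2, "b": 1}
def toy_prefers(i, k, candidates):
    return any(payoff[c] > payoff[k] for c in candidates)

print(quantifiable_strategies({"m1": {"a", "b"}}, toy_prefers))
# -> {'m1': {'a'}}
```

Strategy "b" is eliminated by the limitation ("b", {"a"}), since the mapper prefers "a" to "b"; once no limitation applies, the loop terminates.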
Similar Papers
Traffic Analysis in MapReduce
MapReduce is a programming model which can process large sets of data and produce output. MapReduce uses two functions to complete the work: the Map function and the Reduce function. The Map function is assigned fragmented data as input and emits intermediate data with keys; this keyed intermediate data is sent to the Reducer, where the Reducer will get the inpu...
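The Map, shuffle-and-sort, and Reduce phases described in this snippet can be illustrated with a small self-contained word-count sketch; this models the dataflow in plain Python and is not Hadoop's actual API.

```python
from collections import defaultdict

def map_fn(fragment):
    # The Map function receives a fragment of the input and emits
    # intermediate (key, value) pairs.
    for word in fragment.split():
        yield word, 1

def shuffle_sort(pairs):
    # The framework groups intermediate pairs by key and sorts the
    # keys before handing them to the reducers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_fn(key, values):
    # The Reduce function combines all values emitted for one key.
    return key, sum(values)

fragments = ["map reduce map", "reduce map"]
intermediate = [p for frag in fragments for p in map_fn(frag)]
result = dict(reduce_fn(k, vs) for k, vs in shuffle_sort(intermediate))
# result == {'map': 3, 'reduce': 2}
```

Each fragment is mapped independently, mirroring how fragments are assigned to separate mapper tasks; only the shuffle step requires seeing all intermediate pairs.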
Asymmetric Key-Value Split Pattern Assumption over MapReduce Behavioral Model
Actual Quantifiability is a concept in MapReduce that is based on two assumptions: (1) every mapper is cautious, i.e., does not exclude any reducer's key-value split pattern choice from consideration, and (2) every mapper respects the reducers' key-value split pattern preferences, i.e., deems one reducer's key-value split pattern choice to be infinitely more likely than anoth...
Optimization and analysis of large scale data sorting algorithm based on Hadoop
When dealing with massive data sorting, we usually use Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. A common approach to implementing big data sorting is to use the shuffle and sort phase in MapReduce based on Hadoop. However, if we use it directly, the efficiency could be very low and the loa...
MapReduce with communication overlap (MaRCO)
MapReduce is a programming model from Google for cluster-based computing in domains such as search engines, machine learning, and data mining. MapReduce provides automatic data management and fault tolerance to improve programmability of clusters. MapReduce’s execution model includes an all-map-to-all-reduce communication, called the shuffle, across the network bisection. Some MapReductions mov...
Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics
MapReduce and Spark are two very popular open source cluster computing frameworks for large scale data analytics. These frameworks hide the complexity of task parallelism and fault tolerance by exposing a simple programming API to users. In this paper, we evaluate the major architectural components in the MapReduce and Spark frameworks, including shuffle, execution model, and caching, by using a s...